Zhenhua Han

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026
Via arXiv

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026
Via arXiv

Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark

Aug 26, 2025
Via arXiv

Efficient Serving of LLM Applications with Probabilistic Demand Modeling

Jun 17, 2025
Via arXiv

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction

Jun 14, 2025
Via arXiv

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Jun 14, 2025
Via arXiv

Real-Time Neural-Enhancement for Online Cloud Gaming

Jan 12, 2025
Via arXiv

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Sep 16, 2024
Via arXiv

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Jul 02, 2024
Via arXiv